---
title: Token Filters
---

Token filters apply additional processing to tokens after the tokenizer has produced them. Each filter is configured as a key inside the field's `tokenizer` object.

## Stemmer

Stemming reduces words to their root form so that different inflections of the same word match each other. In English, for example, `running` and `runs` both reduce to `run`. The `stemmer` filter can be applied to any tokenizer.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{
        "description": {"tokenizer": {"type": "default", "stemmer": "English"}}
    }'
);
```

<ParamField body="stemmer">
  Available stemmers are `Arabic`, `Danish`, `Dutch`, `English`, `Finnish`,
  `French`, `German`, `Greek`, `Hungarian`, `Italian`, `Norwegian`,
  `Portuguese`, `Romanian`, `Russian`, `Spanish`, `Swedish`, `Tamil`, and
  `Turkish`.
</ParamField>
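As a rough illustration of the effect, the query below assumes the stemmed index above has been built and that `mock_items` contains a description mentioning `running` (an assumption about the sample data). It uses the `@@@` search operator; if the query term is passed through the same stemmer as the indexed text, a search for `runs` should match rows containing `running`, since both reduce to `run`.

```sql
-- Sketch only: assumes search_idx was created with "stemmer": "English"
-- and that some row's description contains the word "running".
SELECT id, description
FROM mock_items
WHERE description @@@ 'runs'
LIMIT 5;
```

Without the stemmer, the same query would be expected to match only rows that contain the exact token `runs`.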

## Remove Long

The `remove_long` filter removes all tokens longer than the specified number of bytes. If not specified,
`remove_long` defaults to `255`.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{
        "description": {"tokenizer": {"type": "default", "remove_long": 255}}
    }'
);
```

## Lowercase

The `lowercase` filter converts all tokens to lowercase. If not specified, `lowercase` defaults to `true`. Set it to `false` to preserve the original case of each token.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{
        "description": {"tokenizer": {"type": "default", "lowercase": false}}
    }'
);
```
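The following sketch shows what disabling the filter is likely to mean at query time. It assumes the index above was created with `"lowercase": false` and that a description in `mock_items` begins with the capitalized word `Sleek` (an assumption about the sample data). With case preserved in the index, the two searches are expected to behave differently.

```sql
-- Sketch only: assumes search_idx was created with "lowercase": false.

-- Expected to match rows whose description contains the token "Sleek".
SELECT id, description
FROM mock_items
WHERE description @@@ 'Sleek';

-- Expected NOT to match those rows, because "sleek" and "Sleek"
-- are distinct tokens when lowercasing is disabled.
SELECT id, description
FROM mock_items
WHERE description @@@ 'sleek';
```

With the default `"lowercase": true`, both queries should return the same rows.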

## Stopwords Language

<Note>This filter is not supported for the ngram tokenizer.</Note>

The `stopwords_language` filter removes common "stop words" for a specific language from the original text before tokenization.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{
        "description": {"tokenizer": {"type": "default", "stopwords_language": "English"}}
    }'
);
```

<ParamField body="stopwords_language">
  Available languages are `Danish`, `Dutch`, `English`, `Finnish`, `French`,
  `German`, `Hungarian`, `Italian`, `Norwegian`, `Portuguese`, `Russian`,
  `Spanish`, and `Swedish`.
</ParamField>

## Custom Stopwords

<Note>This filter is not supported for the ngram tokenizer.</Note>

The `stopwords` filter removes a custom list of words from the original text before tokenization.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{
        "description": {"tokenizer": {"type": "default", "stopwords": ["shoes", "boots"]}}
    }'
);
```
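Because each filter is a separate key of the `tokenizer` object, several filters can plausibly be set on the same field. The sketch below combines the filters described on this page in one index definition. Treat the combination as an assumption to verify against your installed version; the `DROP INDEX` statement is included only because the earlier examples reuse the name `search_idx`.

```sql
-- Sketch only: combining several token filters on one field.
-- Remove any index left over from the earlier examples first.
DROP INDEX IF EXISTS search_idx;

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field='id',
    text_fields='{
        "description": {
            "tokenizer": {
                "type": "default",
                "lowercase": true,
                "remove_long": 255,
                "stemmer": "English",
                "stopwords_language": "English",
                "stopwords": ["shoes", "boots"]
            }
        }
    }'
);
```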
